Graphs are a fantastic method to communicate quantitative results. Compelling and effective graphs are clean, clear and emphasize data above “empty” graphic design. As Edward Tufte (one of the pioneers of data visualisation) wrote, “Confusion and clutter are failures of design, not attributes of information” (E. R. Tufte, Goeler, and Benson 1990, 53). Good data visualisation places data at the forefront, and is grounded in a strong knowledge of statistics.
A standard USSC graph (for R users, theme_ussc() [1]) has the following elements:
The main content width for the USSC website is 780 pixels (px), which translates to 8.125 inches (in) or about 20.64 centimetres (cm). When saving a publication-ready graph, set the width to 8 inches (about 20 cm), which sits just under the 780 px maximum. Most graphs fit in this div. For wide graphs that span the entire width of the webpage, set the width to 1320 px (13.75 in or 34.925 cm). If the graphs are web-based, the height can vary.
To avoid compression issues that lead to fuzzy fonts and lines, I set the width to be slightly less than the allowed maximum (e.g. 1300 px instead of 1320 px, or 34 cm instead of 35 cm). Play around with different file types (e.g. SVG or PDF instead of PNG). Avoid JPEG files. As a last resort, consider removing titles and captions from the file and adding them into the website manually. [2]
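The conversions above can be checked with a few lines of Python. This is only a sketch: it assumes 96 px per inch (the ratio implied by 780 px = 8.125 in), and the 20 px safety margin in `export_size` is a hypothetical default based on the 1300 px example, not a USSC rule.

```python
# Assumption: 96 px per inch, which matches 780 px = 8.125 in quoted above.
PX_PER_INCH = 96
CM_PER_INCH = 2.54


def px_to_inches(px):
    """Convert a width in pixels to inches."""
    return px / PX_PER_INCH


def inches_to_cm(inches):
    """Convert a width in inches to centimetres."""
    return inches * CM_PER_INCH


def export_size(max_px, margin_px=20):
    """Width to export at, slightly under the maximum (margin is hypothetical)."""
    width_px = max_px - margin_px
    width_in = px_to_inches(width_px)
    return {
        "px": width_px,
        "in": round(width_in, 3),
        "cm": round(inches_to_cm(width_in), 3),
    }


for label, max_px in [("standard", 780), ("wide", 1320)]:
    print(label, export_size(max_px))
```

Pass the resulting width in inches or centimetres to whatever export function your plotting tool provides.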
In statistical graphics, there are three kinds of palettes: qualitative, sequential and diverging. The first is used for coding categorical information and the latter two are for numerical or ordinal variables.
Diverging palettes are often used to emphasize the midpoint. The midpoint must be “significant” or worthy of highlighting. Take exam results, for example: if a result above 50% is a pass, then a result below 50% is a failing grade. Plot exam results using a diverging scale, because you can tell straight away who passed or failed. If you plot the same results using a sequential scale, this fact might not be as obvious.
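To make the exam example concrete, here is a minimal Python sketch of a diverging scale centred on 50%. The choice of red for failing marks, blue for passing marks and light grey at the midpoint is my own assumption for illustration (the palettes below use a purple midpoint), and `diverging_colour` is a hypothetical helper, not part of any USSC tooling.

```python
def hex_to_rgb(code):
    """'#1C396E' -> (28, 57, 110)."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(28, 57, 110) -> '#1C396E'."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def blend(start, end, t):
    """Linear blend between two RGB tuples, with t between 0 and 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


LOW = hex_to_rgb("#ED1B35")   # red: far below the midpoint (assumed meaning)
MID = hex_to_rgb("#CCCCCC")   # neutral grey at the midpoint (assumption)
HIGH = hex_to_rgb("#009DE3")  # blue: far above the midpoint (assumed meaning)


def diverging_colour(value, midpoint=50.0, lo=0.0, hi=100.0):
    """Colour a value by its distance from, and side of, the midpoint."""
    value = max(lo, min(hi, value))
    if value < midpoint:
        t = (midpoint - value) / (midpoint - lo)
        return rgb_to_hex(blend(MID, LOW, t))
    t = (value - midpoint) / (hi - midpoint)
    return rgb_to_hex(blend(MID, HIGH, t))


for score in (0, 35, 50, 65, 100):
    print(score, diverging_colour(score))
```

Marks on either side of 50% end up in visibly different hues, which is exactly what a sequential scale would blur.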
There are two sequential palettes below: blue and greyscale. There is one categorical scale up top. The remainder are diverging scales.
If you have categorical data with more than 6 categories, collapse them! The greater the number of categories, the harder it is to compare them. The folks over at the Urban Institute agree with this advice.
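One way to collapse categories, sketched in Python: keep the largest groups and fold the rest into a single bucket. The "Other" label, the `collapse_categories` name and the sample counts are all made up for illustration. Whether "largest" is the right criterion depends on your data.

```python
from collections import Counter


def collapse_categories(counts, max_categories=6, other_label="Other"):
    """Keep the largest categories and fold the remainder into one bucket."""
    if len(counts) <= max_categories:
        return dict(counts)
    ranked = Counter(counts).most_common()
    keep = ranked[:max_categories - 1]   # leave one slot for the bucket
    rest = ranked[max_categories - 1:]
    collapsed = dict(keep)
    collapsed[other_label] = collapsed.get(other_label, 0) + sum(n for _, n in rest)
    return collapsed


# Made-up counts for eight categories.
sample = {"A": 50, "B": 40, "C": 30, "D": 20, "E": 10, "F": 5, "G": 3, "H": 2}
print(collapse_categories(sample))
```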
|
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
#CCCCCC rgb(204, 204, 204) hsl(0, 0%, 80%) |
#8C8C8C rgb(140, 140, 140) hsl(0, 0%, 55%) |
#000000 rgb(0, 0, 0) hsl(0, 0%, 0%) |
|
Light colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Dark colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Blue colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
|
Greyscale colour palette |
#CCCCCC rgb(204, 204, 204) hsl(0, 0%, 80%) |
#000000 rgb(0, 0, 0) hsl(0, 0%, 0%) |
|
Main colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Light colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#765C8C rgb(118, 92, 140) hsl(273, 21%, 45%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Dark colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#842A51 rgb(132, 42, 81) hsl(334, 52%, 34%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Blue colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#0E6BA8 rgb(14, 107, 168) hsl(204, 85%, 36%) |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
|
Greyscale colour palette |
#CCCCCC rgb(204, 204, 204) hsl(0, 0%, 80%) |
#8C8C8C rgb(140, 140, 140) hsl(0, 0%, 55%) |
#000000 rgb(0, 0, 0) hsl(0, 0%, 0%) |
|
Main colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#097BBB rgb(9, 123, 187) hsl(202, 91%, 38%) |
#4E71A9 rgb(78, 113, 169) hsl(217, 37%, 48%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Light colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#4F71A9 rgb(79, 113, 169) hsl(217, 36%, 49%) |
#9E466F rgb(158, 70, 111) hsl(332, 39%, 45%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Dark colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#612F5B rgb(97, 47, 91) hsl(307, 35%, 28%) |
#A72548 rgb(167, 37, 72) hsl(344, 64%, 40%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Blue colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#097BBC rgb(9, 123, 188) hsl(202, 91%, 39%) |
#125A95 rgb(18, 90, 149) hsl(207, 78%, 33%) |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
|
Greyscale colour palette |
#CCCCCC rgb(204, 204, 204) hsl(0, 0%, 80%) |
#A1A1A1 rgb(161, 161, 161) hsl(0, 0%, 63%) |
#5D5D5D rgb(93, 93, 93) hsl(0, 0%, 36%) |
#000000 rgb(0, 0, 0) hsl(0, 0%, 0%) |
|
Main colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#0E6BA8 rgb(14, 107, 168) hsl(204, 85%, 36%) |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#765C8C rgb(118, 92, 140) hsl(273, 21%, 45%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Light colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#3B7CB7 rgb(59, 124, 183) hsl(209, 51%, 47%) |
#765C8C rgb(118, 92, 140) hsl(273, 21%, 45%) |
#B13B60 rgb(177, 59, 96) hsl(341, 50%, 46%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Dark colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#50315F rgb(80, 49, 95) hsl(280, 32%, 28%) |
#842A51 rgb(132, 42, 81) hsl(334, 52%, 34%) |
#B82243 rgb(184, 34, 67) hsl(347, 69%, 43%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Blue colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#0784C5 rgb(7, 132, 197) hsl(201, 93%, 40%) |
#0E6BA8 rgb(14, 107, 168) hsl(204, 85%, 36%) |
#15518B rgb(21, 81, 139) hsl(209, 74%, 31%) |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
|
Greyscale colour palette |
#CCCCCC rgb(204, 204, 204) hsl(0, 0%, 80%) |
#ACACAC rgb(172, 172, 172) hsl(0, 0%, 67%) |
#8C8C8C rgb(140, 140, 140) hsl(0, 0%, 55%) |
#464646 rgb(70, 70, 70) hsl(0, 0%, 27%) |
#000000 rgb(0, 0, 0) hsl(0, 0%, 0%) |
|
Main colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#10619C rgb(16, 97, 156) hsl(205, 81%, 34%) |
#0589CB rgb(5, 137, 203) hsl(200, 95%, 41%) |
#2F83C0 rgb(47, 131, 192) hsl(205, 61%, 47%) |
#8E4E7A rgb(142, 78, 122) hsl(319, 29%, 43%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Light colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#2F83C0 rgb(47, 131, 192) hsl(205, 61%, 47%) |
#5E699D rgb(94, 105, 157) hsl(230, 25%, 49%) |
#8E4E7A rgb(142, 78, 122) hsl(319, 29%, 43%) |
#BD3457 rgb(189, 52, 87) hsl(345, 57%, 47%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Dark colour palette |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
#453362 rgb(69, 51, 98) hsl(263, 32%, 29%) |
#6F2D57 rgb(111, 45, 87) hsl(322, 42%, 31%) |
#99274B rgb(153, 39, 75) hsl(341, 59%, 38%) |
#C32040 rgb(195, 32, 64) hsl(348, 72%, 45%) |
#ED1B35 rgb(237, 27, 53) hsl(353, 85%, 52%) |
|
Blue colour palette |
#009DE3 rgb(0, 157, 227) hsl(199, 100%, 45%) |
#0589CB rgb(5, 137, 203) hsl(200, 95%, 41%) |
#0B75B4 rgb(11, 117, 180) hsl(202, 88%, 37%) |
#10619C rgb(16, 97, 156) hsl(205, 81%, 34%) |
#164D85 rgb(22, 77, 133) hsl(210, 72%, 30%) |
#1C396E rgb(28, 57, 110) hsl(219, 59%, 27%) |
|
Greyscale colour palette |
#CCCCCC rgb(204, 204, 204) hsl(0, 0%, 80%) |
#B2B2B2 rgb(178, 178, 178) hsl(0, 0%, 70%) |
#989898 rgb(152, 152, 152) hsl(0, 0%, 60%) |
#6F6F6F rgb(111, 111, 111) hsl(0, 0%, 44%) |
#383838 rgb(56, 56, 56) hsl(0, 0%, 22%) |
#000000 rgb(0, 0, 0) hsl(0, 0%, 0%) |
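If you need to double-check an entry in the palettes above, or add a colour of your own, the rgb() and hsl() values can be derived from the hex code. Here is a small Python sketch using the standard library's colorsys module. `describe_colour` is a hypothetical helper name, and rounding conventions may differ by a unit from the values listed above for some colours.

```python
import colorsys


def describe_colour(hex_code):
    """Return the rgb() and hsl() strings for a hex code such as '#1C396E'."""
    code = hex_code.lstrip("#")
    r, g, b = (int(code[i:i + 2], 16) for i in (0, 2, 4))
    # colorsys works on 0-1 floats and returns hue, lightness, saturation.
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    rgb_text = f"rgb({r}, {g}, {b})"
    hsl_text = f"hsl({round(h * 360)}, {round(s * 100)}%, {round(l * 100)}%)"
    return rgb_text, hsl_text


for colour in ("#1C396E", "#ED1B35", "#000000"):
    print(colour, *describe_colour(colour))
```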
Avoid plotting graphs with a dual y-axis. Readers’ eyes are naturally drawn to the difference between the two lines, even when there is no comparison to be made. This is especially egregious when a “smaller” value appears greater than the “larger” value because they’re on different scales. More on this from the team at Datawrapper, as well as a few pieces from notable statisticians and social scientists. Here’s an academic study detailing why dual y-axes are not ideal. [3]
Erase non-data ink, within reason. We want the data to shine!
People use statistics and graphs to distort data, often by deliberately creating misleading graphs. Because many readers lack visualisation literacy, this is, quite frankly, an effective tactic if you want to convince someone that you’re right above all else. Data, statistics, and graphs all add credence to your argument, and many readers, especially those who don’t know statistics, are reluctant to critique quantitative evidence because a) they simply don’t know how to and b) they assume that the author is more statistically knowledgeable than they are: the author made the graph, after all, and presumably knows the data far better than the reader.
On the other hand, a major contributing factor to misleading graphs is the innumeracy of data viz creators themselves. The proliferation of tools like Excel, graphic design programs, and other data viz packages leads anyone to think they can create a graph (just like anyone can write the next great American novel!). However, many of these people lack basic statistical skills, which leads to obvious mistakes. Tufte and others have written on this extensively (E. Tufte and Graves-Morris 1983, p79-87). Indeed, Tufte wrote, “Lurking behind the inept graphic is a lack of judgement about quantitative evidence. … Illustrators too often see their work as an exclusively artistic enterprise. … Those who get ahead are those who beautify data, never mind statistical integrity.” (1983, p79). Still, some of the worst graphs have come from scientists, engineers, computer scientists or economists. There are plenty of examples of truly shocking data viz out there, but we should hold ourselves to a higher standard.
Nathan Yau (a prolific statistician) wrote about misleading viz here. Michael Correll, a research scientist at Tableau, and Jeffrey Heer, a professor of data visualization and human-computer interaction, recently wrote a piece differentiating between malicious misleading visualizations and just plain incorrect visualizations here.
How do people distort data?
Dual y-axis - see above.
Truncating y-axis on bar charts.
Superfluous 3D charts, especially 3D pie charts.
No scale on axis.
Misleading scales on axis.
Deliberately omitting data.
Using percentages to hide the fact that you have a tiny sample size.
Improperly scaled pictograms.
Pie charts: thin slices can be difficult to interpret and people have trouble judging angles, so bar charts are usually better for displaying proportions.
Different colours for identical categories throughout a document - this is especially true when creating small multiples but also exists in other cases. If you compare the US and Australia, keep the colours for both countries consistent throughout for each graph.
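To see how much a truncated y-axis distorts a bar chart, you can compute what Tufte calls the “Lie Factor” in The Visual Display of Quantitative Information: the size of the effect shown in the graphic divided by the size of the effect in the data. Below is a rough Python sketch. The function names and the sample numbers are my own, and it assumes bar height is measured from wherever the axis starts.

```python
def relative_change(first, second):
    """Change from first to second, as a fraction of first."""
    return (second - first) / first


def lie_factor(first, second, baseline=0.0):
    """Effect shown by the bar heights divided by the effect in the data.

    Bars are assumed to be drawn from `baseline` upwards, so a bar for
    value v has visible height v - baseline.
    """
    shown = relative_change(first - baseline, second - baseline)
    actual = relative_change(first, second)
    return shown / actual


# Made-up values: 50 rising to 55 is a 10% increase in the data.
print(lie_factor(50, 55))               # axis starts at zero
print(lie_factor(50, 55, baseline=45))  # axis truncated at 45
```

With the axis starting at zero the factor is 1 (no distortion). Truncating the axis at 45 makes the second bar twice as tall as the first, so a 10% increase looks like a 100% increase.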
In The Visual Display of Quantitative Information [4], Edward Tufte devotes a whole chapter to the idea of “graphical integrity”. He writes, “The main defense of the lying graphic is … ‘Well, at least it was approximately correct, we were just trying to show the general direction of change.’ … A second defense of the lying graphic is that although the design itself lies, the actual numbers are printed on the graphic for those picky folks who want to know the correct size of the effects displayed. … Few writers would work under such a modest standard of integrity, and graphic designers should not either.” (E. Tufte and Graves-Morris 1983, p76-77, emphasis my own.) On page 77, he concludes the chapter with some advice:
Graphical integrity is more likely to result if these six principles are followed:
The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numeric quantities represented.
Clear, detailed, and thorough labelling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graphic itself. Label important events in the data.
Show data variation, not design variation.
In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units.
The number of information-carrying (variable) dimensions depicted should not exceed the number of the dimensions in the data.
Graphics must not quote the data out of context.
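The principle about deflated monetary units is easy to apply in code. Here is a minimal Python sketch that converts nominal amounts into base-year units with a price index. The `deflate` helper, the spending figures and the index values are all invented for illustration. Use a real price index for real work.

```python
def deflate(nominal, price_index, base_year):
    """Convert nominal amounts to base-year units using a price index."""
    base = price_index[base_year]
    return {
        year: amount * base / price_index[year]
        for year, amount in nominal.items()
    }


# Invented numbers: nominal spending and a price index for three years.
spending = {2000: 100, 2010: 130, 2020: 160}
index = {2000: 80, 2010: 100, 2020: 125}

print(deflate(spending, index, base_year=2020))
```

In this made-up series, nominal spending rises by 60% between 2000 and 2020, but in 2020 units it barely moves (156.25 to 160). Plotting the nominal series would overstate the change.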
For example …
Sort your data in a logical (ascending, descending) order before plotting.
Really think about colour. We use colour to differentiate between groups or layers. If all bars measure the same variable, use the same colour. If you want to highlight one group – colour that specific bar a bright colour, leaving all others grey.
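Sorting and highlighting can both be done before the data ever reaches a plotting function. A short Python sketch: the `bar_styles` helper and the sample values are hypothetical, and the two colours are borrowed from the palettes above.

```python
HIGHLIGHT = "#ED1B35"  # red from the palettes above
MUTED = "#CCCCCC"      # light grey from the palettes above


def bar_styles(values, highlight):
    """Sort bars in descending order and colour only the highlighted one."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, value, HIGHLIGHT if name == highlight else MUTED)
        for name, value in ordered
    ]


# Made-up values for four groups.
sample = {"Group A": 12, "Group B": 30, "Group C": 21, "Group D": 7}
for name, value, colour in bar_styles(sample, highlight="Group C"):
    print(name, value, colour)
```

The result is a list of (label, value, colour) rows in plotting order, which most charting tools can consume directly.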
The colours in a statistical graphic should cooperate with each other. The typical purpose of colour in a statistical graphic is to distinguish between different areas or symbols in the plot — to distinguish between different groups or between different levels of a variable. This means that there will typically be several colours, or a palette of colours, used within a plot and that those colours should be related to each other. (Zeileis, Hornik, and Murrell 2009, 2)
Direct labels are easier to understand and read. This is really important from an accessibility standpoint. People who are colour-impaired might struggle to match a legend key to a specific point on the graph.
If you have direct annotations, remove extraneous information. Do not keep axis text, ticks, or lines unless necessary.
Remember to remove the y-axis line when you remove the text. Lines add information when we can quantify the distance… Otherwise, they just take up space.
Finally, if these options do not work and you’re looking for more space, flip the chart so that the y-axis is where the x-axis used to be and vice-versa.
Tip: avoid spaghetti graphs by faceting the data or by highlighting a subset of the data.
Plots the relationship between two continuous variables.
In general, the x-axis is the explanatory variable and the y-axis is the response.
Choropleth maps are often seen as problematic because geographic areas and population vary in size, so maps might mislead the viewer. Yet, people love choropleth maps. (They are pretty!) Remember to ask yourself whether you’re plotting the effect of said variable or if you’re just plotting the population density.
For more info on the suitability of maps for your data, read Kieran Healy’s explainer.
Tip: normalise your data before plotting.
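A common way to normalise is to convert raw counts into a rate per 100,000 residents, so that large regions don’t dominate the map simply because more people live there. Here is a minimal Python sketch with invented regions and numbers. `rate_per_100k` is a hypothetical helper.

```python
def rate_per_100k(counts, population):
    """Convert raw counts per region into a rate per 100,000 residents."""
    return {
        region: counts[region] * 100_000 / population[region]
        for region in counts
    }


# Invented data: the big region has more cases but a much lower rate.
cases = {"Big City": 500, "Small Town": 100}
residents = {"Big City": 1_000_000, "Small Town": 50_000}

print(rate_per_100k(cases, residents))
```

Shading by raw counts would make the big region look worst. Shading by rate shows that the small region is four times more affected.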
When should you use a pie chart? According to experts, almost never.
A note from Edward Tufte:
A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies. (E. Tufte and Graves-Morris 1983, 178).
Tufte, Edward R, Nora Hillman Goeler, and Richard Benson. 1990. Envisioning Information. Cheshire, CT: Graphics Press.
Tufte, Edward, and P Graves-Morris. 1983. The Visual Display of Quantitative Information.
Zeileis, Achim, Kurt Hornik, and Paul Murrell. 2009. “Escaping RGBland: Selecting Colors for Statistical Graphics.” Computational Statistics & Data Analysis 53 (9). Elsevier: 3259–70.
[1] If you use R, feel free to check out this ussc ggplot2 guide.
[2] I plan to fix this problem in the near future…
[3] It’s actually impossible to create dual-axis charts in several data viz packages, including Datawrapper and ggplot2. Hadley Wickham was quite clear about the reason why he hasn’t given users the option to do so.
[4] Simon has a copy, or it can be found in the Sydney Uni Library. It is a valuable, insightful (dare I say necessary?) read.